library(ggoncoplot)
library(dplyr, warn.conflicts = FALSE)
Input Data
The input for ggoncoplot is a data.frame with one row per
mutation. The data.frame must contain columns describing the
following:
Gene Symbol
Sample Identifier
(optional) mutation type
(optional) tooltip (character string: what we show on mouse hover over a particular mutation)
These columns can be in any order and have any name. You map the columns of your input dataset to the required features in the call to ggoncoplot.
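As a minimal sketch of the expected shape, the hand-made table below has one row per mutation. The column names (sample_id, gene, consequence) and the values are hypothetical and chosen only for illustration.

```r
# Hypothetical toy input: one row per mutation, arbitrary column names
toy_mutations <- data.frame(
  sample_id   = c("S1", "S1", "S2", "S3"),
  gene        = c("TP53", "PTEN", "TP53", "EGFR"),
  consequence = c("Missense_Mutation", "Nonsense_Mutation",
                  "Missense_Mutation", "Missense_Mutation"),
  stringsAsFactors = FALSE
)

# S1 appears twice because it carries two mutations
rows_per_sample <- table(toy_mutations$sample_id)
rows_per_sample
```

A table like this would be mapped with col_genes = 'gene' and col_samples = 'sample_id', plus col_mutation_type = 'consequence' if you want tiles coloured by mutation type.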
Minimal Example
# TCGA GBM dataset from the TCGAmutations package
gbm_csv <- system.file(package='ggoncoplot', "testdata/GBM_tcgamutations_mc3_maf.csv.gz")
gbm_df <- read.csv(file = gbm_csv, header=TRUE)
gbm_df |>
ggoncoplot(
col_genes = 'Hugo_Symbol',
col_samples = 'Tumor_Sample_Barcode'
)
Colour by mutation type
Colour tiles by mutation type by specifying col_mutation_type.
# TCGA GBM dataset from the TCGAmutations package
gbm_csv <- system.file(package='ggoncoplot', "testdata/GBM_tcgamutations_mc3_maf.csv.gz")
gbm_df <- read.csv(file = gbm_csv, header=TRUE)
gbm_df |>
ggoncoplot(
col_genes = 'Hugo_Symbol',
col_samples = 'Tumor_Sample_Barcode',
col_mutation_type = 'Variant_Classification'
)
#>
#> ── Identify Class ──
#>
#> ℹ Found 7 unique mutation types in input set
#> ℹ 0/7 mutation types were valid PAVE terms
#> ℹ 0/7 mutation types were valid SO terms
#> ℹ 7/7 mutation types were valid MAF terms
#> ✔ Mutation Types are described using valid MAF terms ... using MAF palete
Control which genes are shown
Show top [n] Genes
Show the 4 most frequently mutated genes using the topn argument.
gbm_df |>
ggoncoplot(
col_genes = 'Hugo_Symbol',
col_samples = 'Tumor_Sample_Barcode',
col_mutation_type = 'Variant_Classification',
topn = 4
)
#>
#> ── Identify Class ──
#>
#> ℹ Found 7 unique mutation types in input set
#> ℹ 0/7 mutation types were valid PAVE terms
#> ℹ 0/7 mutation types were valid SO terms
#> ℹ 7/7 mutation types were valid MAF terms
#> ✔ Mutation Types are described using valid MAF terms ... using MAF palete
Exclude Specific Genes
Use the genes_to_ignore argument to filter out specific
genes, such as TTN and MUC16.
gbm_df |>
ggoncoplot(
col_genes = 'Hugo_Symbol',
col_samples = 'Tumor_Sample_Barcode',
col_mutation_type = 'Variant_Classification',
topn = 10,
genes_to_ignore = c("TTN", "MUC16")
)
#>
#> ── Identify Class ──
#>
#> ℹ Found 9 unique mutation types in input set
#> ℹ 0/9 mutation types were valid PAVE terms
#> ℹ 0/9 mutation types were valid SO terms
#> ℹ 9/9 mutation types were valid MAF terms
#> ✔ Mutation Types are described using valid MAF terms ... using MAF palete
You can use packages like somaticflags to get lists of genes you might want to filter out.
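To see which genes a top-n selection would pick before and after ignoring some of them, you can tally the number of mutated samples per gene yourself. The sketch below uses hypothetical toy data and assumes genes are ranked by the number of distinct mutated samples, with ignored genes removed before ranking; ggoncoplot's internal ranking may differ in its details.

```r
# Hypothetical toy data: one row per mutation
toy <- data.frame(
  Hugo_Symbol = c("TTN", "TTN", "TTN", "TP53", "TP53", "PTEN", "TTN"),
  Tumor_Sample_Barcode = c("S1", "S2", "S3", "S1", "S2", "S3", "S1"),
  stringsAsFactors = FALSE
)

# Count each sample at most once per gene
samples_per_gene <- vapply(
  split(toy$Tumor_Sample_Barcode, toy$Hugo_Symbol),
  function(samples) length(unique(samples)),
  integer(1)
)
samples_per_gene <- sort(samples_per_gene, decreasing = TRUE)

# Drop the genes we want to ignore, then keep the top 2 of what remains
genes_to_ignore <- c("TTN")
kept <- samples_per_gene[!names(samples_per_gene) %in% genes_to_ignore]
top_genes <- names(head(kept, 2))
top_genes
```

Here TTN is mutated in the most samples, so ignoring it lets TP53 and PTEN take the top two places.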
Gene Subset
Let's show only TP53 and TERT.
gbm_df |>
ggoncoplot(
col_genes = 'Hugo_Symbol',
col_samples = 'Tumor_Sample_Barcode',
col_mutation_type = 'Variant_Classification',
genes_to_include = c('TP53', 'TERT')
)
#>
#> ── Identify Class ──
#>
#> ℹ Found 6 unique mutation types in input set
#> ℹ 0/6 mutation types were valid PAVE terms
#> ℹ 0/6 mutation types were valid SO terms
#> ℹ 6/6 mutation types were valid MAF terms
#> ✔ Mutation Types are described using valid MAF terms ... using MAF palete
Control what samples are shown
The show_all_samples argument will add samples that
don’t have mutations in the selected genes to the plot.
gbm_df |>
ggoncoplot(
col_genes = 'Hugo_Symbol',
col_samples = 'Tumor_Sample_Barcode',
col_mutation_type = 'Variant_Classification',
show_all_samples = TRUE
)
#>
#> ── Identify Class ──
#>
#> ℹ Found 7 unique mutation types in input set
#> ℹ 0/7 mutation types were valid PAVE terms
#> ℹ 0/7 mutation types were valid SO terms
#> ℹ 7/7 mutation types were valid MAF terms
#> ✔ Mutation Types are described using valid MAF terms ... using MAF palete
Note that if you supply a metadata table, samples lacking
ANY mutations at all will still not be shown by default. You can include these
samples by setting metadata_require_mutations = FALSE, but
this isn’t recommended unless you’re sure the sample truly has no
mutations at all in the dataframe.
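Before changing that setting, it can help to list which samples have metadata but no rows in the mutation table. The sketch below uses hypothetical toy tables and assumes both tables identify samples with a column of the same name.

```r
# Hypothetical mutation table: S3 has no rows here
toy_mutations <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2"),
  Hugo_Symbol = c("TP53", "PTEN", "EGFR"),
  stringsAsFactors = FALSE
)

# Hypothetical metadata table covering three samples
toy_metadata <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S2", "S3"),
  gender = c("female", "male", "female"),
  stringsAsFactors = FALSE
)

# Samples described in the metadata that never appear in the mutation table
samples_without_mutations <- setdiff(
  toy_metadata$Tumor_Sample_Barcode,
  unique(toy_mutations$Tumor_Sample_Barcode)
)
samples_without_mutations
```

If a listed sample was never sequenced, or its mutations were filtered out upstream, showing it as mutation-free would be misleading.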
Custom Tooltip
Use the col_tooltip argument to indicate which column of
your input dataframe should be used as a custom tooltip.
gbm_df |>
mutate(tooltip = paste0(Chromosome, ":", Start_Position, " ", Reference_Allele, ">", Tumor_Seq_Allele2)) |>
ggoncoplot(
col_genes = 'Hugo_Symbol',
col_samples = 'Tumor_Sample_Barcode',
col_mutation_type = 'Variant_Classification',
col_tooltip = 'tooltip' # We'll specify a custom tooltip based on our new 'tooltip' column
)
#>
#> ── Identify Class ──
#>
#> ℹ Found 7 unique mutation types in input set
#> ℹ 0/7 mutation types were valid PAVE terms
#> ℹ 0/7 mutation types were valid SO terms
#> ℹ 7/7 mutation types were valid MAF terms
#> ✔ Mutation Types are described using valid MAF terms ... using MAF palete
Note that tooltips are HTML, so if you want to insert a line break, just paste
in <br>.
Similarly, if you want to make text in the tooltip bold, try
"<b>text_to_bold</b>".
Where a single sample has multiple mutations in a gene, they are represented as one tile in the oncoplot, and the tooltips for all of those mutations are shown (newline delimited).
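As a small sketch of the HTML formatting described above, the snippet below builds a tooltip column with a bold gene name on the first line and the variant on the second. The table and its coordinates are hypothetical values made up for illustration.

```r
# Hypothetical toy variants (coordinates are illustrative only)
toy <- data.frame(
  Hugo_Symbol = c("TP53", "PTEN"),
  Chromosome = c("17", "10"),
  Start_Position = c(7577121L, 89692905L),
  Reference_Allele = c("G", "C"),
  Tumor_Seq_Allele2 = c("A", "T"),
  stringsAsFactors = FALSE
)

# Bold gene name, an HTML line break, then chromosome:position ref>alt
toy$tooltip <- paste0(
  "<b>", toy$Hugo_Symbol, "</b><br>",
  toy$Chromosome, ":", toy$Start_Position, " ",
  toy$Reference_Allele, ">", toy$Tumor_Seq_Allele2
)
toy$tooltip[1]
#> [1] "<b>TP53</b><br>17:7577121 G>A"
```

The resulting column can then be passed by name through col_tooltip.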
Add Pathway Annotations
We can also add pathway information to the oncoplot by supplying a simple 2-column data.frame.
Currently the default order of pathways and genes in the plot is based on their order of appearance in the pathway data.frame. Future versions of ggoncoplot will support data-based sorting. Any genes in the oncoplot that are missing from the pathway data.frame will be displayed under an ‘Other’ pathway at the very bottom of the plot.
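If you don't have a pathway file to hand, a pathway table can be written by hand. The sketch below is illustrative only: the gene-to-pathway assignments are made up for the example, and naming the gene column to match col_genes is an assumption that should be checked against the ggoncoplot documentation. It also mimics, in base R, the order-of-appearance and 'Other' behaviour described above.

```r
# Hypothetical hand-made pathway table; the gene column name is assumed
# to match the value passed to col_genes
toy_pathways <- data.frame(
  Hugo_Symbol = c("TP53", "MDM2", "EGFR", "PTEN"),
  Pathway = c("p53 signalling", "p53 signalling",
              "RTK signalling", "PI3K signalling"),
  stringsAsFactors = FALSE
)

# Pathways in their order of first appearance in the table
pathway_order <- unique(toy_pathways$Pathway)

# Genes absent from the pathway table fall back to 'Other'
genes_in_plot <- c("TP53", "EGFR", "TTN")
assigned <- toy_pathways$Pathway[match(genes_in_plot, toy_pathways$Hugo_Symbol)]
assigned[is.na(assigned)] <- "Other"
assigned
```

Because order is taken from the table, reordering its rows is the simplest way to control where each pathway appears.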
path_pathways <- system.file("testdata/GBM_tcgamutations_mc3.pathways.csv", package = "ggoncoplot")
pathways_df <- read.csv(path_pathways)
gbm_df |>
ggoncoplot(
col_genes = 'Hugo_Symbol',
col_samples = 'Tumor_Sample_Barcode',
col_mutation_type = 'Variant_Classification',
pathway = pathways_df
)
#> Found pathway column: Pathway
#> Warning: 7 unknown levels in `f`: Cytoskeleton structure, Nuclear envelope structure,
#> Muscle structure, Endocytosis, Extracellular matrix, Ciliary function, and
#> Neurotransmitter release
#>
#> ── Identify Class ──
#>
#> ℹ Found 7 unique mutation types in input set
#> ℹ 0/7 mutation types were valid PAVE terms
#> ℹ 0/7 mutation types were valid SO terms
#> ℹ 7/7 mutation types were valid MAF terms
#> ✔ Mutation Types are described using valid MAF terms ... using MAF palete
Add margin plots
Gene Barplot
Shows how many samples have mutations in each gene (optionally coloured by mutation type).
gbm_df |>
ggoncoplot(
col_genes = 'Hugo_Symbol',
col_samples = 'Tumor_Sample_Barcode',
col_mutation_type = 'Variant_Classification',
draw_gene_barplot = TRUE
)
#>
#> ── Identify Class ──
#>
#> ℹ Found 7 unique mutation types in input set
#> ℹ 0/7 mutation types were valid PAVE terms
#> ℹ 0/7 mutation types were valid SO terms
#> ℹ 7/7 mutation types were valid MAF terms
#> ✔ Mutation Types are described using valid MAF terms ... using MAF palete
Tumour Mutation Burden
Set draw_tmb_barplot = TRUE to plot the
total number of mutations (total mutational burden) in each sample. In
many datasets, a single hypermutator makes it hard to
see less extreme trends, so by default mutational burden is plotted
on a log10 scale. This can be changed by setting
log10_transform_tmb = FALSE.
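To illustrate why the log10 scale is the default, the sketch below uses hypothetical per-sample mutation counts with one hypermutator and compares relative bar heights on both scales. It also shows why bars cannot be split by mutation type on a log scale: the log of a sum is not the sum of the logs.

```r
# Hypothetical per-sample mutation counts, including one hypermutator (S4)
mutation_counts <- c(S1 = 40, S2 = 65, S3 = 80, S4 = 12000)

# Bar heights relative to the tallest bar on a linear scale:
# the three ordinary samples are almost invisible next to S4
linear_heights <- mutation_counts / max(mutation_counts)
round(linear_heights, 3)

# The same comparison on a log10 scale keeps all samples readable
log_heights <- log10(mutation_counts) / max(log10(mutation_counts))
round(log_heights, 3)

# Stacking by mutation type breaks on a log scale:
# the height of the total differs from the stacked heights of its parts
missense <- 30
nonsense <- 10
total_height <- log10(missense + nonsense)
stacked_height <- log10(missense) + log10(nonsense)
c(total = total_height, stacked = stacked_height)
```

On the linear scale every ordinary sample sits below 1% of the tallest bar, whereas on the log10 scale the shortest bar is still more than a third of the tallest.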
gbm_df |>
ggoncoplot(
col_genes = 'Hugo_Symbol',
col_samples = 'Tumor_Sample_Barcode',
col_mutation_type = 'Variant_Classification',
draw_tmb_barplot = TRUE,
#log10_transform_tmb = FALSE
)
#>
#> ── Identify Class ──
#>
#> ℹ Found 7 unique mutation types in input set
#> ℹ 0/7 mutation types were valid PAVE terms
#> ℹ 0/7 mutation types were valid SO terms
#> ℹ 7/7 mutation types were valid MAF terms
#> ✔ Mutation Types are described using valid MAF terms ... using MAF palete
#> ! TMB plot: Ignoring `col_mutation_type` since `log10_transform = TRUE`.
#> This is because you cannot accurately plot stacked bars on a logarithmic scale
Add both TMB and Gene Barplots
Usually, we’ll want to draw both margin plots (TMB + gene).
gbm_df |>
ggoncoplot(
col_genes = 'Hugo_Symbol',
col_samples = 'Tumor_Sample_Barcode',
col_mutation_type = 'Variant_Classification',
draw_tmb_barplot = TRUE,
draw_gene_barplot = TRUE
#log10_transform_tmb = FALSE
)
#>
#> ── Identify Class ──
#>
#> ℹ Found 7 unique mutation types in input set
#> ℹ 0/7 mutation types were valid PAVE terms
#> ℹ 0/7 mutation types were valid SO terms
#> ℹ 7/7 mutation types were valid MAF terms
#> ✔ Mutation Types are described using valid MAF terms ... using MAF palete
#> ! TMB plot: Ignoring `col_mutation_type` since `log10_transform = TRUE`.
#> This is because you cannot accurately plot stacked bars on a logarithmic scale
Add clinical annotations
gbm_clinical_csv <- system.file(
package = "ggoncoplot",
"testdata/GBM_tcgamutations_mc3_clinical.csv"
)
gbm_clinical_df <- read.csv(file = gbm_clinical_csv, header = TRUE)
ggoncoplot(
gbm_df,
col_genes = "Hugo_Symbol",
col_samples = "Tumor_Sample_Barcode",
col_mutation_type = "Variant_Classification",
metadata = gbm_clinical_df,
cols_to_plot_metadata = c('gender', 'histological_type', 'prior_glioma', 'tumor_tissue_site'),
draw_tmb_barplot = TRUE,
draw_gene_barplot = TRUE,
show_all_samples = TRUE
)
#> ℹ 2 samples with metadata have no mutations. Fitering these out
#> ℹ To keep these samples, set `metadata_require_mutations = FALSE`. To view them in the oncoplot ensure you additionally set `show_all_samples = TRUE`
#> → TCGA-06-0165-01A-01D-1491-08
#> → TCGA-06-0167-01A-01D-1491-08
#>
#> ── Identify Class ──
#>
#> ℹ Found 7 unique mutation types in input set
#> ℹ 0/7 mutation types were valid PAVE terms
#> ℹ 0/7 mutation types were valid SO terms
#> ℹ 7/7 mutation types were valid MAF terms
#> ✔ Mutation Types are described using valid MAF terms ... using MAF palete
#> ! TMB plot: Ignoring `col_mutation_type` since `log10_transform = TRUE`.
#> This is because you cannot accurately plot stacked bars on a logarithmic scale
Static Plots
gbm_df |>
ggoncoplot(
col_genes = 'Hugo_Symbol',
col_samples = 'Tumor_Sample_Barcode',
col_mutation_type = 'Variant_Classification',
interactive = FALSE
)
#>
#> ── Identify Class ──
#>
#> ℹ Found 7 unique mutation types in input set
#> ℹ 0/7 mutation types were valid PAVE terms
#> ℹ 0/7 mutation types were valid SO terms
#> ℹ 7/7 mutation types were valid MAF terms
#> ✔ Mutation Types are described using valid MAF terms ... using MAF palete
